DerivBase.hr: A High-Coverage Derivational Morphology Resource for Croatian

Author

  • Jan Snajder
Abstract

Knowledge about derivational morphology has proven useful for a number of natural language processing (NLP) tasks. We describe the construction and evaluation of DERIVBASE.HR, a large-coverage derivational morphology resource for Croatian. DERIVBASE.HR groups 100k lemmas from the hrWaC web corpus into 56k clusters of derivationally related lemmas, so-called derivational families. We focus on suffixal derivation between and within nouns, verbs, and adjectives. We propose two approaches: an unsupervised approach, and a knowledge-based approach that relies on a hand-crafted morphology model but uses no additional lexico-semantic resources. The resource acquisition procedure consists of three steps: corpus preprocessing, acquisition of an inflectional lexicon, and induction of derivational families. We describe an evaluation methodology based on manually constructed derivational families, from which we sample and annotate pairs of lemmas. We evaluate DERIVBASE.HR on the resulting sample and show that the knowledge-based version attains good clustering quality: 81.2% precision, 76.5% recall, and 78.8% F1-score. As with similar resources for other languages, we expect DERIVBASE.HR to be useful for a number of NLP tasks.
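The abstract does not spell out the exact evaluation protocol, so the following is only a minimal sketch of one common way to score a clustering at the level of lemma pairs: a pair counts as related when both lemmas fall into the same family, and precision, recall, and F1 are computed by comparing predicted pairs against gold pairs. The function names (`related_pairs`, `pairwise_prf`, `f1_score`) and the toy lemma families are hypothetical illustrations, not taken from the paper or from DERIVBASE.HR itself.

```python
from itertools import combinations


def related_pairs(families):
    """Return the set of unordered lemma pairs that share a family."""
    pairs = set()
    for family in families:
        # Sorting gives each unordered pair one canonical representation.
        for a, b in combinations(sorted(family), 2):
            pairs.add((a, b))
    return pairs


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pairwise_prf(predicted, gold):
    """Pair-level precision, recall, and F1 of predicted vs. gold families."""
    pred = related_pairs(predicted)
    ref = related_pairs(gold)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    # Toy families, assumed for illustration only.
    gold = [{"pisati", "pisac", "pisanje"}, {"čitati", "čitatelj"}]
    predicted = [{"pisati", "pisac"}, {"pisanje", "čitati", "čitatelj"}]
    print(pairwise_prf(predicted, gold))

    # Consistency check on the figures reported in the abstract.
    print(round(100 * f1_score(0.812, 0.765), 1))
```

The last line checks only that the reported F1-score is the harmonic mean of the reported precision and recall; it says nothing about how those two values were obtained.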


Similar resources

CroDeriV: a new resource for processing Croatian morphology

The paper deals with the processing of Croatian morphology and presents CroDeriV – a newly developed language resource that contains data about morphological structure and derivational relatedness of verbs in Croatian. In its present shape, CroDeriV contains 14 192 Croatian verbs. Verbs in CroDeriV are analyzed for morphemes and segmented into lexical, derivational and inflectional morphemes. T...


Derivational and Semantic Relations of Croatian Verbs

Keywords: derivational morphology, morphosemantic relations, derivational relations, prefixation, semantic relations, Croatian WordNet. This paper deals with certain morphosemantic relations between Croatian verbs and discusses their inclusion in Croatian WordNet. The morphosemantic relations in question are the semantic relations between unprefixed infinitives and their prefixed deri...


DErivBase: Inducing and Evaluating a Derivational Morphology Resource for German

Derivational models are still an underresearched area in computational morphology. Even for German, a rather resource-rich language, there is a lack of large-coverage derivational knowledge. This paper describes a rule-based framework for inducing derivational families (i.e., clusters of lemmas in derivational relationships) and its application to create a high-coverage German resource, DERIVBASE,...


Morphosemantic relations between verbs in Croatian WordNet

This paper deals with morphosemantic relations between Croatian verbs and discusses their inclusion in Croatian WordNet. Morphosemantic relations refer to semantic relations between morphologically related verbs, i.e., between verbs from the same derivational family. A derivational family consists of verbs with the same lexical morpheme grouped around a base form. Generally, a verb with the sim...


The Lemlat 3.0 Package for Morphological Analysis of Latin

This paper introduces the main components of the downloadable package of the 3.0 version of the morphological analyser for Latin Lemlat. The processes of word form analysis and treatment of spelling variation performed by the tool are detailed, as well as the different output formats and the connection of the results with a recently built resource for derivational morphology of Latin. A light e...




Publication date: 2014